PREFACE

This Quarterly Progress Report (QPR) marks the first QPR of
a new speech contract separate from the larger ARPA contract
(DAHC15-71-C-0088) under which speech research has formerly been
performed.

Since we will now be issuing QPRs under a new contract and
for a new contract monitor, I would like to adopt a slightly
different editorial policy from that which we have followed in
the past. Each QPR in the new series will consist of two parts:
a brief Survey of Progress containing a few paragraphs
describing the major progress in the individual components of the
project, and a Technical Notes section containing detailed
specifications of experiments performed, programs implemented,
design studies, and, where appropriate, supporting data and
appendices. The Technical Notes will aspire to publication
quality, although they will in general assume knowledge of the
continuity of the project and thus lack much of the introductory
material which would be present in a self-contained publication.
They may also include appendices and tables of supporting data in
excess of that which would be permitted in most journal
publications. It is hoped that the Technical Notes section will
serve as an archive of information discovered in the course of
the project that will be of use to other researchers.

The Managing Editor for the new QPR series is Ms.
Nash-Webber.


W.A.  Woods 
I. PROGRESS OVERVIEWS
A. Acoustic-Phonetics

During the past months, we have been putting a great deal of
effort into the design and preliminary implementation of
parameter-based segmentation and labeling strategies. These have
been based on intuitions developed through a continuing series of
organized parameter reading sessions, which have had the
additional benefit of providing us with segment lattices for use
in lexical retrieval experiments. Section II of this report will
present a summary of these sessions and their results to date.
It will also describe the preliminary segmentation programs which
have been based on these results and the improvements to our
labeling algorithms also deriving from them.

B. Lexical Retrieval

Recent work on the lexical retrieval component has consisted
of the formulation, implementation, and extension of our scoring
philosophy and lexical lookup procedure, along with corresponding
work on our lexicon. The scoring philosophy is based on Bayesian
analysis and involves finding the most probable utterance and
pronunciation model for a given acoustic waveform. The new
lexical lookup procedure, designed to handle alternate
pronunciations, segmentation errors and boundary effects in a
fast and efficient way, has resulted in a restructuring of the
lexicon into a tree format. Section III of this report presents
the arguments for and details of the work done in these areas.
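
As a rough illustration of why a tree-structured lexicon can speed up lookup, the sketch below stores pronunciations in a phoneme tree so that words sharing an initial phoneme sequence share a single path. It is only a sketch under stated assumptions: the class name, the phoneme spellings, and the two sample words are hypothetical, and the report's actual lexicon format is the one presented in Section III.

```python
class LexiconTree:
    """Phoneme tree: each node maps a phoneme to a child node."""

    def __init__(self):
        self.children = {}  # phoneme -> LexiconTree
        self.words = []     # words whose pronunciation ends at this node

    def add(self, word, phonemes):
        """Insert one pronunciation; alternates are simply added again."""
        node = self
        for ph in phonemes:
            node = node.children.setdefault(ph, LexiconTree())
        node.words.append(word)

    def match(self, segments):
        """Return (word, length) for every word matching a prefix of segments."""
        found, node = [], self
        for i, seg in enumerate(segments):
            node = node.children.get(seg)
            if node is None:
                break
            found.extend((w, i + 1) for w in node.words)
        return found


# Hypothetical entries; the phoneme symbols are illustrative only.
lexicon = LexiconTree()
lexicon.add("trip", ["T", "R", "IH", "P"])
lexicon.add("trips", ["T", "R", "IH", "P", "S"])
matches = lexicon.match(["T", "R", "IH", "P", "S", "AX"])
```

A single left-to-right pass over the segments finds both candidate words here, because the shared prefix is walked only once.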

C.  Verification 

Work in word verification during the past quarter has been
directed towards improving the quality of the synthesis. In
order to give the Verification component access to phonological
knowledge, we have broadened the phonetic context in which the
phonetic-to-acoustic parameter conversion is done to include a
generative phonological mechanism. This was done by embedding
the conversion program in the Bobrow-Fraser rule tester formalism
[1] and adding features with associated numerical values to the
existing set of binary distinctive features. In addition, we
have worked on expanding and modifying the set of phonological
rules to deal with these new features.

In order to judge the quality of synthesis programs, we have
also implemented a waveform synthesizer which accepts a
parametric representation as input. This component is currently
being tested using parameters mechanically extracted from actual
utterances. These will serve as benchmarks against which we can
compare the parametric output of successive implementations of
the rule-driven phonetic-to-acoustic synthesizer.
D. Hardware

During the past quarter, we took delivery of two small
computer systems which will form a signal processing facility for
both the speech understanding and speech compression projects: a
DEC PDP11/40 and a Signal Processing Systems Inc. SPS-41 (which
was purchased on a previous contract).

The PDP11 has 32K of parity core memory and memory
management and extended arithmetic options. This is augmented by
24K of Standard Memories core, 8K of semiconductor memory shared
with the SPS-41, and a Telefile DC16H/CD213 disk. The disk
system has a capacity of 30 million words; although it is
moving-head, it is sufficiently fast to support spooling to and
from the A/D and D/A interface at rates in excess of 20,000
samples per second. Our IMLAC graphics system will also be
connectible directly to the PDP11.

The SPS-41 signal processor is connected to the PDP11
UNIBUS; its most efficient data communication path with the PDP11
is via the 8K semiconductor shared memory mentioned above. The
SPS-41 contains an Input-Output Processor of exceptional
versatility, so our machine has dual 12-bit A/D and D/A
converters, together with the necessary clock hardware, installed
there. In addition to playing out sampled signals, the D/A's are
also valuable debugging aids, for they may be used to drive
oscilloscope displays of data buffers in the SPS-41 at various
stages of signal processing.

We do not yet have an ARPANET interface for the PDP11, and
we are still awaiting implementation and documentation of the
virtual memory ELF operating system for the PDP11. The lack of a
file system in ELF is also an obstacle; we will probably have to
implement a temporary one in user code for use until one is
developed for the ELF system.


E. Syntax

The grammar for the syntactic component has been both
expanded to include a sub-grammar for parsing date expressions
and to handle verb-particle constructions, and also simplified to
yield more easily interpretable structures for previously
accepted utterances.

A system of weights on the arcs of the grammar has been
developed to allow the parser to [illegible] paths relative to
one another in order to choose the best set of paths for
extension. Experimentation is underway to explore the various
mixtures of depth first and breadth first parsing strategies
which are now available to the Parser.

Section IV of this report gives details and examples of this
work.
F. Semantics

As well as continuing in the construction of a semantic
network to represent the conceptual structure underlying the
travel budget management lexicon and in the parallel development
of functions for semantic theory building which understand
additional types of semantic network relationships, work on
semantics has been directed to the extension and improvement of
the basic network formalism. The result has been the production
of a set of general semantic network utility packages for
creating, editing, accessing, printing and merging semantic
networks. This will be described in more detail in Section V.

G. Pragmatics

The Pragmatics component is currently being developed to
perform four functions: to complete the interpretation of a
theory on the basis of pragmatic information, to evaluate a thus
completed interpretation, to make suggestions to semantics and
syntax, and to execute a complete utterance interpretation. In
all cases, procedures are involved to apply knowledge about the
discourse and the intention of the speaker. [2]

The input to Pragmatics is a theory word list, a partially
instantiated case frame token from Semantics, plus the
corresponding structure from syntax. The first procedure
(INSTANCE-MAP) determines likely intentions on the basis of words
in the theory, e.g. a simple declarative statement probably
implies "add new information" or "edit old information". The
next procedure (MODE-STATUS) suggests possible intentions on the
basis of the discourse structure. The resulting intentions are
then combined and used by the procedure REIFY to fill in ellipsis
and resolve anaphoric references. Completing an interpretation
also involves performing quantifier scoping. This is done by
LIFT-QUANT, which moves inner quantifiers to the outermost
position. Finally, either EXECUTE or EVALUATE is called on the
completed interpretation. EXECUTing an interpretation may add to,
delete from, or change the data base, or retrieve information.
Accordingly, data base operations are directly under the control
of Pragmatics. EVALUATing an interpretation results in a list of
case-score-suggestion triples for each case in the case frame
token, as well as a score and suggestion list for the token as a
whole.

Scores are discrete valued indicators of the likelihood of
either a particular case filler or case frame token. Suggestions
are either substitutes for unlikely case fillers, proposals for
likely ones, or higher concepts in which the concept expressed by
the given case frame token may be embedded. E.g., if a trip
description refers to an existing trip, then "edit" is a likely
higher concept.

These procedures are currently under development and will be
described in detail in succeeding QPRs.
II. ACOUSTIC SEGMENTATION AND LABELING


A. 

During the past quarter we have expanded our data base of
sentences related to the travel budget task. We now have 47
digitized utterances on line (from 3 male speakers) taken from a
list of 27 sentences (See Appendix A). Twenty of these
utterances have been carefully hand labeled, guided by
parameters, spectrograms, time-waveforms, and original analog
recordings. An ideal hand labeling indicates


1) The time (in 100 microsecond units) of the beginning of
each phonetic element, each word (marked by "/"), and
each syllable (marked as [illegible]).

2) The silent period (SI) and the burst and aspiration of
plosives (See Appendix B).

3) The stress levels assigned to vowels (0=unstressed,
1=secondary stress, 2=primary stress).


These utterances are being used to test our acoustic
segmentation and labeling strategies. In addition, our statistics
gathering program is using them to generate quantitative
statistical measures for later use by the segmentation and
labeling programs. The existence of such a data base is crucial
to the successful development of advanced segmentation and
labeling algorithms, and we plan to continue its expansion in
parallel with the development of our acoustic analysis module.
B. Parameter Reading

In order to gain insight into segmentation strategies, we
have been holding organized parameter reading sessions over the
last month, similar to earlier spectrogram reading sessions.
With no a priori knowledge of the content of an utterance, the
readers use a number of energy related parameters to segment the
data into major categories: sonorants, vowels, strident or weak
fricatives, and plosives. (See Figure 1 for a plot of these
parameters for a sample sentence.) Obstruents are also classified
as voiced or unvoiced if possible. Poles from the linear
prediction analysis of the utterance are then used to determine
vowel and consonant identities based on steady state and
transitional values, respectively. Five sample segment lattices
resulting from the blind reading are shown plotted above the
ideal transcription in Figures 2a-2e.


Our experience to date on parameter reading can be
summarized as follows:


a) Human segmentation error is less than one percent,
taking into account phonological variations which will
exist in the lexicon.

b) The resulting segment lattice is never more than two
deep, and alternate paths occur, on the average, twice
per utterance. (An average utterance consists of 25
segments.)

c) Because we were reading parameters, rather than
spectrograms, we believe our strategies can be
implemented with computer programs with relative ease.
In fact, we believe that we can develop an acoustic
segmenter to mimic human parameter reading well enough
to yield comparable performance.

d) Formant targets, transitions and context dependent
acoustic phonetic knowledge were used extensively. A
successful labeler must be able to incorporate such
knowledge.

A further benefit of these parameter reading sessions has been
the resulting segment lattices, which are being used in lexical
retrieval experiments. Also, performance analysis after the
"blind" reading experiment results in a correct ideal
transcription based on the lexical identity of the utterance.
This is taken as the standard of correctness for all segmentation
and labeling experiments and used in the data base for
statistical measurements for the computation of lexical scores.
C. Preliminary Segmentation Programs

In an attempt to simulate the preliminary phase of the
segmentation performed by human parameter readers, we developed a
program which looks for boundaries between sonorant and obstruent
sequences using the parameter LFE (low frequency energy from
120-440 Hz.). The global level of LFE fluctuates, usually
decreasing, over an utterance, and obstruents frequently exhibit
small but noticeable dips which have fairly high minima.
Consequently, a general dip detector with several variable
parameters was developed for looking at curves and detecting
dips and plateaus adjacent to dips. Its purpose is to locate all
obstruents which have low energy in the low frequencies. This
includes all unvoiced sounds, most occurrences of voiced
plosives, all strident fricatives, and most occurrences of [V,
DH, HH, DX]. (There are times when these latter obstruents occur
between vowels that there is no dip in LFE; however, at these
times a large decrease in energy in the higher frequencies can be
noted.)

In the first test run of this program on 37 utterances
(spoken by 3 male speakers, for a total of 1145 phonetic segments
in 357 sonorant sequences and 383 obstruent sequences), there
were 10 places where errors were made and incorrect dips were
found. Six of these were in the last syllable of the utterance,
where amplitude and fundamental frequency drop off; seven
occurred in the 8 sentences spoken by a speaker with very low
fundamental frequency (WAW). What follows is a more detailed
discussion of the cause of these errors.

Since our analysis is not pitch-synchronous, the energy in
the low frequencies can fluctuate rapidly when the pitch period
becomes greater than 10 msec for a 20 msec analysis window. We
have spent some effort trying to distinguish these fluctuations
from those due to voiced plosives or weak fricatives. We are
currently investigating the use of a zero-phase unit-gain filter
to smooth out these effects. Unfortunately, this filter also
eliminates some of the dips which should be found, but we hope to
be able to find a reliable and computationally reasonable
procedure for detecting this condition and eliminating this
source of error.
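
To make the trade-off concrete: a pitch period above 10 msec corresponds to a fundamental below 100 Hz, so a 20 msec window holds fewer than two full periods and the measured energy ripples from frame to frame. The sketch below shows one simple kind of zero-phase unit-gain filter, a symmetric centered FIR smoother. The three weights, the edge handling, and the sample contour are assumptions made for illustration; the report does not specify the filter under investigation.

```python
def smooth_zero_phase(values, weights=(0.25, 0.5, 0.25)):
    """Apply a symmetric, centered FIR filter to a per-frame contour.

    Symmetric weights give zero phase shift (dips are not moved in time),
    and weights that sum to 1 give unit gain for a constant input.
    """
    if len(weights) % 2 == 0 or list(weights) != list(reversed(weights)):
        raise ValueError("weights must be symmetric with odd length")
    half = len(weights) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - half, 0), last)  # repeat the edge frames
            acc += w * values[j]
        out.append(acc)
    return out


# Hypothetical LFE contour in dB, one value per frame: a frame-to-frame
# ripple of +/-2 dB around 60 dB, with a genuine dip at frame 4.
lfe = [62, 58, 62, 58, 50, 58, 62, 58, 62]
smoothed = smooth_zero_phase(lfe)
```

In this made-up contour the ripple is flattened and the dip stays at frame 4, but its minimum rises from 50 to 54 dB. That is the side effect noted above: shallow genuine dips can be smoothed away along with the ripple.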

In a second test run, the threshold for dips was increased
slightly, to eliminate falsely detected dips, with the result
that several of the correct dips were also missed. However, when
this threshold was combined with the original one and dips found
by only the lower threshold were treated as optional, only 3
errors remained. That is, 7 of the incorrectly found dips from
the first test run were made optional. Hence, in the absence of a
procedure to eliminate the above source of segmentation error, it
appears we can deal with most of it by labeling questionable
segmentations as optional.
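
The two-threshold idea can be sketched as follows. This is a minimal illustration, not the program described above: the depth measure (taken from the lower of the two neighboring peaks), the threshold values, and the sample contour are all assumptions, and the plateau handling of the actual dip detector is not reproduced.

```python
def find_dips(contour, low_threshold, high_threshold):
    """Label each local minimum as 'firm' or 'optional', or ignore it.

    Depth is measured from the lower of the two surrounding peaks, so a
    dip is judged against its neighborhood rather than a global level.
    """
    dips = []
    for i in range(1, len(contour) - 1):
        if not (contour[i] < contour[i - 1] and contour[i] <= contour[i + 1]):
            continue
        # Climb to the nearest peak on each side of the minimum.
        left = i
        while left > 0 and contour[left - 1] >= contour[left]:
            left -= 1
        right = i
        while right < len(contour) - 1 and contour[right + 1] >= contour[right]:
            right += 1
        depth = min(contour[left], contour[right]) - contour[i]
        if depth >= high_threshold:
            dips.append((i, "firm"))      # found by both thresholds
        elif depth >= low_threshold:
            dips.append((i, "optional"))  # found by the lower threshold only
    return dips


# Hypothetical LFE contour in dB and hypothetical thresholds.
contour = [60, 61, 48, 59, 60, 56, 60, 58, 52, 57]
dips = find_dips(contour, low_threshold=3, high_threshold=6)
```

An optional dip would then contribute an alternate path to the segment lattice instead of forcing a boundary, leaving later components to decide.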

Preliminary tests indicate that this program can also be
used to find nasals and some glides within sonorant sequences
(operating on energy from 640-2800 Hz.). It also may be useful
in looking at other bands of energy.


D. Improvements to Statistics Package

A module was added to the display routines of the statistics
package, enabling scatter diagrams to be made in 3 dimensions.
With the aid of reference lines and the ability to rotate the
display, it is possible to develop more complex decision spaces.
It is now also possible to superimpose any combination of the
previous 15 scatter diagrams or distributions (See Figures 3-5).
The program has also been made faster and more flexible to
improve interactions. Searching 20 utterances for a prescribed
context and tabulating the desired statistics takes less than 1
second.

E. Acoustic-Phonetic Algorithms Developed

In reading parameters, we found that there were some
segmentation and labeling decisions which were difficult to make.
Therefore, we ran short experiments with the statistics facility
to try to arrive at reasonable decision criteria for these cases.
Several classification algorithms were also derived while
developing new features of the statistics gathering facility.
The following is a list of these difficult types of decisions and
the criteria we set up for making them.


1) For plosives followed by vowels (but not preceded by
strident fricatives), the voiced/unvoiced distinction was
made by measuring a parameter related to voice onset time
(VOT). Rather than using the VOT indicated in the ideal
labeling, this period was determined by searching for the
burst (indicated by the lowest 2nd derivative of energy
after its minimum value) and the beginning of the vowel
(indicated by the maximum derivative of energy after the
burst). This more complicated procedure was used to ensure
that measuring VOT automatically was possible.

This duration correctly classified 46 of the 48
plosives examined as voiced or unvoiced. Figure 3a contains
the density distributions for voiced (dotted line) and
unvoiced (solid line) plosives. The time scale is in units
of 10 msec frames. The region of overlap indicates that 8%
of the 24 unvoiced plosives would be incorrectly classified
as voiced, if a decision boundary were assigned just below
30 msec. (In fact, our basic philosophy precludes assigning
decision boundaries whenever there is a nonzero overlap, but
error rate is a good subjective measure of minimum
performance.) The cumulative distributions shown in Figure
3b with grid lines superimposed illustrate another way of
evaluating the performance of an algorithm.

Though this performance is good, it is felt that it can
be improved. (Both errors were the result of an error in
locating the burst.) The time measures used were rounded to
the nearest 10 msec, but finer measures may improve
performance. Also, dependencies on place of articulation
and the following vowel and stress level were not
considered.
2) For plosives followed by vowels, the place of articulation
of the plosive was determined using the two-pole frequency
approximation to the peak for the 20 msec analysis window
centered around the burst. Also used is the 10 msec change
in F3 just before the silence. (This could clearly be made
more complex.) Figure 4a shows a two dimensional view of a
scatter diagram in the 3 dimensions described, rotated to
show maximum separation of classes. There are 6 [k]'s, 17
[t]'s and 1 [p] for speaker JJW. Reference lines are drawn
from each data point (at the lower left of each label) to
the plane DF3=0, to aid in visualization of their relative
locations. (Figure 4b shows the same plot without the
reference lines.) When the two-pole procedure models the
spectrum as 2 real poles, the frequencies of the poles are
always 0 and/or 5000 Hz. For those [t] and [k] bursts which
have a pole at 5000 Hz (the lower ends of their reference
lines form a straight line), the other 2 parameters must be
used.


The boundary separating the [t]'s and [k]'s in the
group on the left might be questioned, since there are
several samples of [t]'s and [k]'s which are quite close to
each other. Though one would expect the frequency and
bandwidth of a burst to be related, more data should be used
to verify this boundary. Figure 4c is a view of the same
data from the "top" of Figure 4a. This accentuates the
group with a two-pole frequency at 5000 Hz. It also shows
that most of the other [t]'s and [k]'s are separable by
frequency alone. The [p] and [t] appear inseparable, so
this would cause one error in a decision oriented system.
Since more data is needed, and burst characteristics can be
speaker dependent, 14 more samples were taken from speaker
DWD. These are displayed with the initial samples in
Figures 4d and 4e. Comparison reveals that all the samples
(3) of [t] for DWD have a frequency of 5000 Hz and a
consistently lower bandwidth than those for speaker JJW.
Note (Figure 4e) that the frequency during the burst of a
[p] in un-preemphasized speech is low due to the absence of
any high frequencies. (The burst frequency is around 10-12
kHz, much past the 5 kHz range.) The group of 19 [t]'s and
[k]'s at 5000 Hz appears harder to separate, but it can
be seen that, within the plane of TPF1=5000, the [t]'s form
a semi-circle around the [k]'s. More data is needed. There
is still only 1 definite confusion in the 38 samples: the
[t] with TPF1=0.
3) For strident fricatives [S,SH,Z,ZH], the distinction of
dental [S,Z] vs. palatal [SH,ZH] was made using the
two-pole frequency 2/3 of the way into the fricative. Out
of 60 cases used in the statistics program, there were 2
errors: an [S] followed by an [R] was classified as [SH], as
was an [S] followed by a [Y]. Paying attention to both the
transitions of the peak frequencies and the following
context should eliminate these errors.

4) Deciding whether a dip in energy between a vowel-like region
and a fricative region was the normal dip expected or rather
an indication of a vowel-plosive-fricative,
vowel-plosive-aspiration or vowel-affricate sequence was
done using the depth of the dip alone. This depth was
simply computed as the maximum value of the energy in the
preemphasized signal in the vicinity of the dip minus the
minimum value. Though this decision was frequently made
incorrectly during the human parameter reading experiments
described above, the programmed depth criteria performed
quite well. Figure 5a compares histograms of this depth,
using a 5 dB bin size, for 37 cases of vowel-fricative
(dotted line) and 50 cases of vowel-plosive (solid line). A
boundary at 29 dB would result in 3 errors in the 87 cases.
The cumulative distributions are shown in Figure 5b, along
with a third distribution (dotted line on the right) which
represents 4 vowel-affricate sequences and 10
vowel-plosive-fricative sequences (included among the 50).
For any samples which fall between 16 and 21 dB (14 out of
81 do) there would have to be two segmentation paths,
vowel-fricative and vowel-plosive, each with a likelihood
dependent on the actual depth.

5) In human parameter reading we also found it difficult to
decide whether a fricative region between a silence and a
vowel-like region represented the aspiration due to an
unvoiced plosive, the heavy aspiration due to a [T-R] or
[K-R] cluster, or a fricative between a plosive and vowel.
82 examples were separated into these three categories using
the duration of the frication and the maximum value of
energy from 3400-5000 Hz. during the frication. There were
4 errors. Grouping [R] with other vowels left only 2 errors
in the 2 class distinction of plosive-sonorant vs.
plosive-fricative-sonorant.
Work on the above distinctions is far from complete. Those
mentioned are given primarily as examples of the type of
algorithms developed using the statistics facility.
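
As one example, the voice onset time measurement described above for plosives followed by vowels might be sketched as below. The code follows the wording of that criterion literally (burst at the lowest second difference of energy after the energy minimum, vowel onset at the steepest rise after the burst). The function names, the 10 msec frame length, the 30 msec boundary, and the sample contour are illustrative assumptions, not the program actually used.

```python
def measure_vot(energy, frame_ms=10):
    """Estimate voice onset time from a per-frame energy contour.

    Returns (burst_frame, vowel_frame, vot_ms).
    """
    n = len(energy)
    closure = min(range(n), key=lambda i: energy[i])  # first energy minimum
    if closure > n - 3:
        raise ValueError("contour too short after the energy minimum")

    def curvature(i):
        # Second difference centered on frame i.
        return energy[i + 1] - 2 * energy[i] + energy[i - 1]

    def rise(i):
        # First difference from frame i to frame i + 1.
        return energy[i + 1] - energy[i]

    # Burst: lowest second difference after the minimum.
    burst = min(range(closure + 1, n - 1), key=curvature)
    # Vowel onset: the frame reached by the steepest rise after the burst.
    vowel = max(range(burst, n - 1), key=rise) + 1
    return burst, vowel, (vowel - burst) * frame_ms


def classify_voicing(vot_ms, boundary_ms=30):
    """Long VOT suggests an unvoiced plosive; the boundary is an assumption."""
    return "unvoiced" if vot_ms >= boundary_ms else "voiced"


# Hypothetical energy contour in dB: closure, burst, aspiration, vowel.
energy = [50, 30, 10, 10, 25, 22, 24, 45, 55, 56]
burst, vowel, vot_ms = measure_vot(energy)
label = classify_voicing(vot_ms)
```

Because the measurement is quantized to whole frames, a finer frame step would give the finer time measures suggested above.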

Richard  Schwartz 
Victor  Zue 
III. LEXICAL RETRIEVAL

Recent work on the lexical retrieval component has consisted
of the formulation, implementation, and extension of our scoring
philosophy and lexical lookup procedure, along with corresponding
work on our lexicon. Much of this work resembles that done on
the CASPERS system [1], with specific extensions to generalize
the lookup component to handle segment lattices, probabilistic
segment specification, potential scoring, etc. The three areas
that will be discussed here are:

1) Scoring Philosophy

2) Lexical Lookup

3) Phonetic and Phonological Representation

A. Scoring Philosophy


Let $U_i$ be the ith utterance in an enumeration of all
acceptable utterances.

Let $PM_{ij}$ be the jth pronunciation model associated with
utterance $U_i$ (i.e. an underlying representation of a
particular pronunciation of the utterance).

Let $F(t)$ be the acoustic waveform.


Our scoring philosophy is predicated on finding the most
probable utterance $U_i$ and pronunciation model $PM_{ij}$, given
the waveform $F(t)$. I.e., find $U_i$ and $PM_{ij}$ such that
$P((U_i, PM_{ij}) \mid F(t))$ is maximized. (This philosophy is
discussed in more detail in [1].)

BBN Report No. 3018
Bolt Beranek and Newman Inc.

Using Bayes' Rule we find that:

\[
P((U_i, PM_{ij}) \mid F(t)) = \frac{P(U_i, PM_{ij}) \cdot P(F(t) \mid (U_i, PM_{ij}))}{P(F(t))} \tag{1}
\]

By writing the probability expression in this way, we can
more easily isolate its dependence on pragmatics, semantics,
syntax, prosodics, and phonetics in a way which is not apparent
in the original expression. We do this by noting that the new
expression can be broken into three different components:

The first component, $P(U_i, PM_{ij})$, can be written as
$P(U_i) \cdot P(PM_{ij} \mid U_i)$, where $P(U_i)$ is the a priori
probability that utterance $U_i$ is spoken, and
$P(PM_{ij} \mid U_i)$ is the probability that pronunciation model
$PM_{ij}$ characterizes the acoustics, given that $U_i$ is spoken.
The former is determined largely by the syntax, semantics and
pragmatics of the task domain, while the latter, though affected
by them, is primarily a function of phonetic and prosodic
information. (We are assuming that the pronunciation model
$PM_{ij}$ characterizes both the phonetic and prosodic
information in the acoustic waveform.) $P(PM_{ij} \mid U_i)$ is
primarily determined by phonetic implications of $U_i$ through
specific word pronunciations (and their predictable word boundary
effects) and secondarily by prosodic implications of the syntax,
semantics, and pragmatics of $U_i$.
The second component, $P(F(t) \mid (U_i, PM_{ij}))$, is the
probability that the observed acoustics, $F(t)$, would have been
produced, given that utterance $U_i$ was spoken with pronunciation
model $PM_{ij}$. Whatever effect $U_i$ might have had has already
been encoded in its pronunciation model $PM_{ij}$ and therefore
may be deleted from the expression. Hence
$P(F(t) \mid (U_i, PM_{ij}))$ becomes effectively
$P(F(t) \mid PM_{ij})$, which is purely a function of acoustics
and acoustic phonetics.

If $F(t)$ can be partitioned into segments which correspond
one-to-one with the phonemes in the model $PM_{ij}$, we see that
$P(F(t) \mid PM_{ij})$ can be decomposed into a product of
probabilities:

\[
P(F(t) \mid PM_{ij}) = P(F_1(t) \mid PM_{ij}) \cdot P(F_2(t) \mid PM_{ij}, F_1(t)) \cdots P(F_n(t) \mid PM_{ij}, F_1(t), \ldots, F_{n-1}(t)) \tag{2}
\]

where $PM_{ij}$ is a sequence of phonetic elements or phonemes
$A_{ij}(1)\, A_{ij}(2) \ldots A_{ij}(k) \ldots A_{ij}(n)$,

and $F_k(t)$ for $k = 1, \ldots, n$ is the portion of the waveform
corresponding to $A_{ij}(k)$.

(We assume that the pronunciation model $PM_{ij}$ has already
been adjusted to reflect phonological effects (e.g. at word
boundaries), alternate word pronunciations, prosodics etc.) The
idea that each phoneme in $PM_{ij}$ "produces" a matching segment of
$F(t)$ is implicit in our choice of a sequential pronunciation
model. However, the correspondence between $A_{ij}(k)$ and $F_k(t)$ is
not quite as simple as we would like.

Let $PM_{ij}(k)$ be $A_{ij}(k)$ in the context of
$A_{ij}(1)\, A_{ij}(2) \ldots A_{ij}(k-1)$ and $A_{ij}(k+1) \ldots A_{ij}(n)$.
Then equation (2) can be rewritten as:

$$
P(F(t) \mid PM_{ij}) = P(F_1(t) \mid PM_{ij}(1)) \cdot P(F_2(t) \mid PM_{ij}(2), F_1(t)) \cdots P(F_n(t) \mid PM_{ij}(n), F_1(t), \ldots, F_{n-1}(t)) \tag{3}
$$

since only phoneme $A_{ij}(k)$ (in the context provided by
$PM_{ij}$) is responsible for $F_k(t)$ (assuming correct
segmentation).

The third component, $P(F(t))$, is independent of particular
utterances and pronunciation models. Because it is independent
it will not affect the ultimate ranking or ordering of any two
theories spanning the whole waveform. However, when theories are
composed of word sequences which span different portions of the
acoustic waveform, the probability of the waveform over these
different portions must be known in order to correctly rank each
of the theories. In order to see how each portion of $F(t)$
affects the value of $P(F(t))$, we note that $P(F(t))$ can also be
decomposed into segment size pieces as:

$$
P(F(t)) = P(F_1(t)) \cdot P(F_2(t) \mid F_1(t)) \cdots P(F_n(t) \mid F_1(t), \ldots, F_{n-1}(t)) \tag{4}
$$

Since we will never be able to exhaustively score every
possible utterance, we are forced to search some selected subset
of the acceptable utterances, skipping those which appear to be
unlikely. We desire therefore to pursue the most likely theories
first. This will only be possible if theories spanning different
portions of the acoustics can be ranked correctly. It might well
be possible to find such most probable utterances without precise
calculation of $P(F(t))$ over different regions of $F(t)$, but we
must keep in mind the very real possibility of having to search a
space far too large to be practical or possible for a successful
real time solution if such ranking is done poorly.

The  value  of  a scoring  philosophy,  no  matter  how  well 
formulated,  is  for  all  practical  purposes  only  as  good  as  its 
implementation.  We  know  from  experience  though,  that  properly 
motivated  simplifications  can  be  made  which  permit  accurate 
approximations of the scoring philosophy. Presently two
simplifications, each reducing the extent of the dependent
context  indicated  by  the  scoring  philosophy,  appear  to  be  most 
reasonable. 

The first simplification results from the observation that
while each $PM_{ij}(k)$ may produce a slightly different looking
acoustic waveform, the significant waveform characteristics can
be accounted for if a single phonetic element, $A_{ij}(k)$, and a
small local context is known. Every $PM_{ij}$ can now be rewritten in
terms of a finite set of new symbols if each $A_{ij}(k)$ and its
relevant local context is represented by a single symbol $B_{ij}(k)$.

I.e. $PM_{ij} = B_{ij}(1)\, B_{ij}(2) \ldots B_{ij}(k) \ldots B_{ij}(n)$
where $B_{ij}(k) \approx PM_{ij}(k)$, $k = 1, \ldots, n$.
Equation (3) can now be rewritten as:

$$
P(F(t) \mid PM_{ij}) = P(F_1(t) \mid B_{ij}(1)) \cdot P(F_2(t) \mid B_{ij}(2), F_1(t)) \cdots P(F_n(t) \mid B_{ij}(n), F_1(t), \ldots, F_{n-1}(t)) \tag{5}
$$

The second simplification results from the observation that
the conditioning of the probabilities in the above equation on
the segments $F_k(t)$ beyond adjacent segments compensates for
theoretical differences between the influencing context expected
by $B_{ij}(k)$ and the actual context observed. We assume, therefore,
that the only differences which are meaningful are those which
occur during the region of influencing context encoded by
$B_{ij}(k)$ (e.g. one segment on either side).

Hence:

$$
P(F(t) \mid PM_{ij}) = P(F_1(t) \mid B_{ij}(1)) \cdot P(F_2(t) \mid B_{ij}(2), F_1(t)) \cdots P(F_n(t) \mid B_{ij}(n), F_{n-1}(t)) \tag{6}
$$

Notice that the probability still is very much contextually
dependent (i.e. independence assumptions have been made only
where independence is well motivated). The change to $B_{ij}$'s as a
means of encoding influencing contextual effects in a unique
symbol provides an efficient technique for avoiding the apparent
circular necessity to recognize the context of each $A_{ij}$ in order
to correctly compensate for its effect.
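
As a minimal sketch of equation (6), assuming that some acoustic
model supplies a conditional likelihood for each segment given its
context symbol and the preceding segment, the product can be
accumulated in the log domain. The function names and the toy
likelihood values below are hypothetical and are not part of the
system described in this report.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical signature: likelihood(segment, context_symbol, previous_segment)
# stands for P(F_k(t) | B_ij(k), F_{k-1}(t)); previous_segment is None for k = 1.
Likelihood = Callable[[str, str, Optional[str]], float]


def log_score_eq6(segments: Sequence[str],
                  symbols: Sequence[str],
                  likelihood: Likelihood) -> float:
    """Log of equation (6): sum over k of log P(F_k | B(k), F_{k-1})."""
    if len(segments) != len(symbols):
        raise ValueError("segmentation must match the model one-to-one")
    total = 0.0
    previous = None
    for segment, symbol in zip(segments, symbols):
        total += math.log(likelihood(segment, symbol, previous))
        previous = segment
    return total


def toy_likelihood(segment, symbol, previous):
    """Illustrative numbers only: a match scores 0.8, a mismatch 0.1."""
    return 0.8 if segment == symbol else 0.1


score = log_score_eq6(["d", "ey", "dx", "ax"],
                      ["d", "ey", "dx", "ax"],
                      toy_likelihood)
```

Working in logarithms is a choice made for this sketch; it avoids
underflow when many small probabilities are multiplied.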

Much of the analysis so far has been concentrated on the
decomposition of $P((U_i, PM_{ij}) \mid F(t))$ into segment size pieces (for
the sake of clarity and ease of presentation). In our
implementation, however, this probability is computed in word
size chunks, based on individual word scores which are in turn
based on the scores of a series of the segments "matched" with
the $B_{ij}$'s (contextually compensated $A_{ij}$'s) of each word. E.g.,
if in pronunciation model $PM_{ij}$ a certain word spans the $k+1$ to
$k+m$ segments, its pronunciation model, WPM, is

$B_{ij}(k+1)\, B_{ij}(k+2) \ldots B_{ij}(k+m)$

and its score is calculated as follows:

$$
\text{Word Score} = \prod_{o=1}^{m} \frac{P(F_{k+o}(t) \mid B_{ij}(k+o), F_{k+o-1}(t))}{P(F_{k+o}(t) \mid F_1(t), \ldots, F_{k+o-1}(t))} \tag{7}
$$
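
The word score of equation (7) can be sketched as follows. This is
an illustration only: it assumes that the numerator and denominator
terms have already been estimated elsewhere, and the function name
and the numbers are hypothetical.

```python
import math
from typing import Sequence


def log_word_score(conditional: Sequence[float], prior: Sequence[float]) -> float:
    """Log of equation (7).

    conditional[o] stands for P(F_{k+o} | B_ij(k+o), F_{k+o-1}) and
    prior[o] for P(F_{k+o} | F_1, ..., F_{k+o-1}), one pair per segment
    of the word.
    """
    if len(conditional) != len(prior):
        raise ValueError("one prior term is needed per segment")
    return sum(math.log(c) - math.log(p) for c, p in zip(conditional, prior))


# Illustrative numbers only: a three-segment word and a two-segment word.
long_word = log_word_score([0.6, 0.5, 0.7], [0.2, 0.25, 0.2])
short_word = log_word_score([0.6, 0.5], [0.2, 0.25])
```

Because every factor is divided by the probability of the segment
it covers, the scores of words spanning different numbers of
segments remain comparable, which is the property the ranking of
partial theories requires.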

B.  Lexical  Lookup 

We desire to have a lexical lookup procedure which has the
following capabilities:

1)  It permits consistent implementation of the scoring
    philosophy.

2)  It is relatively insensitive to random occurrences of noise.

3)  It is capable of being extended to handle large
    vocabularies.

4)  It permits alternate pronunciations.

5)  It handles missing and extra boundaries (segmentation
    errors).

6)  It handles phonological word boundary effects.

7)  It makes accurate compensation in its scoring procedure for
    effects due to contextual dependence.

8)  It operates fast and efficiently.

9)  It can work on selected portions of the vocabulary (e.g.
    due to syntax selection or word length constraints).
During the past quarter, implementation of a compiler and
extension of the lexical lookup procedure used in CASPERS have
been accomplished. What follows is a brief description of how
the lexical lookup component works.

In a search of the entire vocabulary, one would not like to
(accidentally) reject a word until it is known that an acceptably
high word score can never be achieved. The fact that we assign
to each word a score that is built up from the product of the
scores of its component segments, together with the fact that it
is possible to handle phonological word boundary effects in an
efficient manner if all words beginning the same are grouped
together, strongly suggests a tree structured vocabulary.

As a result the lexical lookup procedure depends upon a
prestructured vocabulary tree. The purpose of the compiler is to
assemble a set of words and word boundary rules into an
appropriate tree structure (2). Any path starting at the root
and traversing through the tree corresponds to the pronunciation
of some word in the vocabulary. The score calculated for the
path is, in effect, the score for the associated word. Note
however that because many paths are merged together near the root
of the tree, the total effort to compute all such word scores is
reduced substantially. See Figure 1.
Figure 1: Sample Tree Structure


The set of word scores is computed using a stack of paths
(pointers into the tree structure). Each has an associated score
for the path from the root to that place in the tree. The stack
is updated by taking each such path and its score, stepping one
level deeper into the tree, and scoring each subsequent path
relative to the old path score. If the score of any particular
path through the tree should get sufficiently poor relative to
other paths and it is known that pursuing any path in that
subtree could not result in a score equal to or exceeding the
minimum allowed score (set by a threshold), the path and the
subtree under it may be thrown away, thereby saving additional
computation. Thus both scoring and rejection are done on sets of
words (on subtrees) in an efficient and satisfying manner. Again
this technique is discussed in detail in (1).
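
The stack-driven lookup described above can be sketched roughly as
follows. This is not the CASPERS code: the class and function
names, the toy lexicon and the matching function are all
hypothetical, and the sketch assumes every per-segment factor is at
most 1, so that a path falling below the threshold can never
recover. It also returns only words that span exactly the given
segments and ignores word boundary rules.

```python
from typing import Callable, Dict, List, Tuple


class TreeNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TreeNode"] = {}
        self.words: List[str] = []  # words whose pronunciation ends here


def build_tree(lexicon: Dict[str, List[str]]) -> TreeNode:
    """Merge pronunciations that begin the same way into shared paths."""
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, TreeNode())
        node.words.append(word)
    return root


def lookup(root: TreeNode,
           segments: List[str],
           match: Callable[[str, str], float],
           threshold: float) -> Dict[str, float]:
    """Advance a stack of (node, score) paths one tree level per segment."""
    stack: List[Tuple[TreeNode, float]] = [(root, 1.0)]
    for segment in segments:
        next_stack = []
        for node, score in stack:
            for phone, child in node.children.items():
                new_score = score * match(segment, phone)
                # Dropping the path here discards its whole subtree.
                if new_score >= threshold:
                    next_stack.append((child, new_score))
        stack = next_stack
    return {word: score for node, score in stack for word in node.words}


# Toy lexicon and match function, for illustration only.
tree = build_tree({
    "data": ["d", "ey", "dx", "ax"],
    "date": ["d", "ey", "t"],
    "day": ["d", "ey"],
    "debt": ["d", "eh", "t"],
})
found = lookup(tree, ["d", "ey", "t"],
               lambda seg, phone: 0.9 if seg == phone else 0.05,
               threshold=0.05)
```

All four words share the initial "d" node, so that segment is
scored once rather than four times; the "eh" branch falls below the
threshold at the second segment and "debt" is never scored further.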

C.  Phonetic and Phonological Representations of the Lexicon

In  order  to  begin  some  incremental  simulation  experiments, 
we  have  decided  to  temporarily  fix  the  dictionary  for  the  travel 
budget  management  domain  in  its  current  state  of  approximately 
450  words.  Also  toward  that  end,  we  have  started  to  specify  and 
store  the  phonetic  representation  of  each  of  the  words.  Phonetic 
baseforms  have  been  determined  for  each  word,  and  phonological 
rules which will derive alternate pronunciations from the
baseforms have been collected. (Though most words will have only
one phonetic baseform, those having pronunciations which cannot all
be predicted from a single baseform with reasonable phonological
rules will have more.) These alternate pronunciations will be
included  in  the  lexicon  tree  along  with  the  baseforms,  and 
preliminary  calculations  indicate  that  this  will  result  in  a two- 
to  three-fold  increase  in  its  size. 

To ensure the correctness of our lexical representations, we
plan to get an outside evaluation of the correctness of these
baseforms, phonological rules, and marking of syllable boundaries
and stress levels from the Speech Communications Research
Laboratory. We also will try to determine quantitative
information on the relative likelihood of one pronunciation
versus another. For example, the word "data" is more often
pronounced with a flapped, rather than an un-flapped, [t].
Quantitative information of this sort will be incorporated into
our new lexical retriever and word verifier.
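
To illustrate how a phonological rule might derive a weighted
alternate pronunciation from a baseform, here is a small sketch. The
rule, the phone symbols, the vowel set and the likelihood value are
all hypothetical stand-ins; the project's actual rules and
likelihoods are not given in this report.

```python
from typing import List, Tuple

VOWELS = {"ey", "ax", "aa", "iy"}  # illustrative subset, not the project's phone set


def flap_alternates(baseform: List[str],
                    p_flap: float) -> List[Tuple[List[str], float]]:
    """Apply a hypothetical optional rule: [t] -> flap [dx] between vowels.

    Returns (pronunciation, relative likelihood) pairs. Only the first
    eligible [t] is rewritten, which keeps the sketch short.
    """
    for k in range(1, len(baseform) - 1):
        if (baseform[k] == "t"
                and baseform[k - 1] in VOWELS
                and baseform[k + 1] in VOWELS):
            flapped = baseform[:k] + ["dx"] + baseform[k + 1:]
            return [(flapped, p_flap), (list(baseform), 1.0 - p_flap)]
    return [(list(baseform), 1.0)]


# p_flap = 0.7 is an illustrative value, not a measured one.
alternates = flap_alternates(["d", "ey", "t", "ax"], p_flap=0.7)
```

Each alternate returned this way would become its own path in the
lexicon tree, with the likelihood available as an a priori weight.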


John  Klovstad 
IV.  THE SYNTACTIC COMPONENT


During  the  past  quarter  progress  on  the  syntactic  component 
of  SPEECHLIS  has  been  made  on  both  the  grammar  and  the  parser. 

The grammar has been extended to include a subgrammar for
parsing date expressions which occur frequently in discourse
concerning travel budgets. Such diverse ways of expressing dates
as "July one", "One July", "July first", "Monday, the tenth of
April, 1975", and others can now be successfully parsed.
Extensive testing of the sentential complement facility of the
grammar was also done, and sentences such as "It costs four
hundred dollars to go to California", "Suppose (that) the budget
has five thousand dollars", and "I have arranged for John to go to
Washington" can all be parsed correctly.

In  order  to  handle  particle  constructions  (e.g.  "Should  a 
new  budget  be  made  up?",  "Can  we  send  him  out  to  California 
before  June?",  "I  need  to  figure  out  how  much  money  I have",  "Add 
the costs up"), we found it necessary to make changes to the
dictionary  as  well  as  the  grammar.  These  dictionary  changes 
involved  marking  verbs  which  can  take  particles  and  indicating 
how  the  features  of  the  verb-particle  pair  differ  from  the 
features  of  the  verb  alone.  These  changes  currently  await 
testing.  A list  of  sentences  using  particles  was  sent  to  Wayne 
Lea  at  Univac  for  inclusion  in  an  experiment  to  test  various 
hypotheses  about  prosodic  cues  to  syntactic  structures  since 
verb-particle pairs seem to have very different prosodic contours
than  regular  verbs  and  prepositions  occurring  together. 

Several changes were also made to the form of structures
produced by the parser. For example, passivization is no longer
undone. That is, a sentence like "The money was spent by John",
which was formerly parsed into a structure similar to that
produced by "John spent the money", now retains "the money" as the
sentential subject and "by John" as a sentence-level
prepositional phrase. The reason is that a sentence such as "The
money was spent by February" cannot be similarly undone, unless
some semantic or pragmatic guidance is used to produce "Someone
spent the money by February". It was decided that the parser
should produce the surface structure for passive utterances along
with an indication that the passive voice had been used, and that
Semantics would make its case assignments taking the voice of the
verb into account.

In all, approximately 40 sentences have been parsed with the
current grammar, and have been found to produce structures which
are amenable to semantic interpretation. A list of many of these
sentences is given in Appendix C, and parsings for some of them
are shown in Appendix D.

One change to the grammar which was made in order to effect
a significant change in the parser was the inclusion of a weight,
currently a small integer, as an additional component of every
arc. This weight was originally conceived of as a rough measure
of either (a) how likely the arc is to be taken when the parser
is in that state or (b) how much information is likely to be
gained from taking this arc, i.e. how likely the parse path
including this arc is to be correct. That these two schemes are
not equivalent can be seen by the following example. In a given
state, say just after the main verb of the sentence has been
found, the arc which accepts a particle may be much less likely
than the arc which jumps to another state to look for
complements. However, if a particle which agrees with the verb is
found in the input stream at this point, then the particle arc is
more likely to be correct.

Since  the  relative  frequency  of  arcs  from  a given  state  is 
already  reflected  to  some  extent  by  their  ordering  within  the 
state,  it  was  decided  that  the  weights  would  be  associated  with 
information content. The actual weight assigned to each arc
reflects  an  intuitive,  though  experienced,  guess. 

The parser was modified to employ the weights in the
following way. Each configuration created receives a score which
is determined by the score on the configuration preceding it and
the weight on the transition between them. In the simplest case,
the score of a new configuration is the sum of the score of
the previous configuration and the weight on the arc between them.
Thus the score on a configuration may be considered the score of
the parse path terminating on that configuration. If the arc is
a PUSH arc, the score of the terminating configuration also
depends on the score which is attached to the constituent used by
the PUSH arc. Thus if there are several possible constituents in
a well-formed substring table at a given point, those which look
the best will increase the score of the paths which use them.

The parser then considers a set of the highest-scored
active  configurations  and  tries  to  extend  each  of  them  in  turn 
before  selecting  a new  set.  In  this  way  some  parallelism  is 
achieved,  less  likely  configurations  are  not  extended,  and  some 
of  the  dangers  of  depth  first  processing  are  avoided. 
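
A rough sketch of this control strategy follows. It is not the
SPEECHLIS parser: the toy arc table, its weights and the function
name are hypothetical, configurations are reduced to (score, state)
pairs, and only the simplest scoring case (previous score plus arc
weight) is shown, with PUSH arcs left out.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy grammar: state -> list of (arc weight, next state).
ARCS: Dict[str, List[Tuple[int, str]]] = {
    "S": [(3, "NP"), (1, "AUX")],
    "NP": [(2, "VP")],
    "AUX": [(1, "NP")],
    "VP": [],
}


def extend_best(active: List[Tuple[int, str]],
                set_size: int) -> List[Tuple[int, str]]:
    """Extend the set_size highest-scored configurations along every arc.

    A new configuration's score is the previous score plus the weight
    of the arc taken. Configurations outside the selected set stay
    active but are not extended on this step.
    """
    chosen = heapq.nlargest(set_size, active)
    remaining = list(active)
    for config in chosen:
        remaining.remove(config)
    for score, state in chosen:
        for weight, next_state in ARCS[state]:
            remaining.append((score + weight, next_state))
    return remaining


step1 = extend_best([(0, "S")], set_size=2)
step2 = extend_best(step1, set_size=1)
```

With `set_size=1` the second step extends only the NP configuration
and leaves the lower-scored AUX configuration waiting, which is the
sense in which less likely configurations are not extended.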


Madeleine  Bates 
V.  SEMNET - THE NETWORK UTILITY PACKAGE

In  the  course  of  constructing  the  semantic  network  for 
travel  budget  management,  we  noted  several  facilities  unavailable 
in  our  existing  formalism  and  implementation,  which  nevertheless 
seemed  essential  to  have.  These  included  such  things  as  the 
ability  to  store  information  about  specific  arcs  and  a way  for 
several people to co-operate on the construction of the same
semantic network in a reasonable manner. As these seemed to be
of general utility, and not confined to networks for speech
understanding research, extensive work was done this quarter on
extending and improving our semantic network formalism and its
implementation. What follows is a description, albeit a brief
one, of the current network package, SEMNET. Where features have
been modified or extended from earlier versions of the system
(documented in [1,2,3]) those features will be noted, along with
the reasons for the change.

A.  Network  Components 

Within the SEMNET formalism, there are three types of
entities making up a semantic network: nodes, links and augments.
A node is a place at which information about a conceptual entity
is collected and organized. A link is a directed association
either between two nodes, or between a node and some information
outside the network. A particular node->link->node triple is
termed an arc, and an augment is a way of associating both
network and extra-network information with one or more arcs.

Nodes may correspond to words, objects, events, etc.,
whatever one wants to have treated as a unique conceptual entity.
A node may either be named, by associating with it a LISP print
name, or be nameless. Independently, it may possess an "ego"
which specifies the reason for its existence as a separate
entity. For example, there may be one node whose name is "Brick
1" and another whose ego is "Brick 1 as the lintel of Arch 1".
Both names and egos are implemented as properties of a node,
called PNAME and EGO respectively (where a property is one of two
types of network links to be discussed next).

Nodes are connected to each other in this formalism via
named links, called relations if they are two-way connections or
properties if the connection is in a single direction.
Properties may also be used to associate with a node information
outside the network, as exemplified by the PNAME and EGO
properties mentioned above. The bi-directionality of relations
is effected by means of link inverses. That is, when a relation
link of type R is established between nodes A and B, so too
automatically is a link of type R-inverse between B and A. The
semantic network formalism has also been extended to allow one to
declare reflexive relations like EQUALS (i.e. ones which are
their own inverses) in order to eliminate redundant inverses.
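
The automatic maintenance of inverse links can be sketched as
follows. This is an illustration in Python rather than the LISP
implementation; the class, the method names and the inverse name
KIND/OF are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Network:
    def __init__(self) -> None:
        self.inverse: Dict[str, str] = {}
        # links[node][linkname] -> list of nodes reached
        self.links: Dict[str, Dict[str, List[str]]] = defaultdict(
            lambda: defaultdict(list))

    def declare_relation(self, name: str, inverse: Optional[str] = None) -> None:
        """Declare a relation; omitting the inverse makes it its own inverse."""
        inverse = inverse or name
        self.inverse[name] = inverse
        self.inverse[inverse] = name

    def add_relation(self, a: str, relation: str, b: str) -> None:
        """Link a -> b via relation and b -> a via its inverse."""
        self._link(a, relation, b)
        self._link(b, self.inverse[relation], a)

    def _link(self, source: str, name: str, target: str) -> None:
        # The membership check keeps a self-inverse relation from
        # storing the same link twice.
        if target not in self.links[source][name]:
            self.links[source][name].append(target)


net = Network()
net.declare_relation("KINDS", "KIND/OF")   # hypothetical inverse name
net.declare_relation("EQUALS")             # its own inverse
net.add_relation("FRUIT", "KINDS", "BANANA")
net.add_relation("A", "EQUALS", "B")
```

For EQUALS, the same linkname serves in both directions, so no
separate inverse linkname needs to be created.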
All network links are named, and each linkname has its own
associated node in the semantic net. While this may be treated
as an invisible implementation decision by the network designer,
one may also take advantage of it, as we have in the SPEECHLIS
network, as a place to specify facts true of all arcs with a
given linkname. For example, it can be used to store the name of
the relation's inverse, such logical properties of a link as
whether several arcs with that linkname entering a node should be
treated as ANDed or ORed and how arcs of that type associate with
other arcs, etc. As will be seen in the following example, this
need not be exclusively meta-information (i.e. non-conceptual,
logical or probabilistic data). As a result, the distinction
between "primitive" links and built-up relations that we had
previously made, following Shapiro [4], has become blurred. For
example, consider the network fragment:

101
    PNAME     STATE
    KINDS     (SOLID)(LIQUID)(GAS)
    NODETYPE  (LINKNAME)

102
    PNAME     WATER
    FORMS     (Water as a solid 103)(Water as a gas 104)
              (Water as a liquid 105)

103
    EGO       Water as a solid
    STATE     (SOLID)
    FORM/OF   (WATER)
    PNAME     ICE

Here STATE is both a conceptual entity and the name of a
relation. Its existence as a conceptual entity (or node) allows
us to specify explicitly such information as its possible values.
As a link, it allows us to say that ICE is water in its solid
state. (Note in the above example that PNAMEs are printed in
upper case, and EGOs in both upper and lower. The terminal nodes
of arcs are printed enclosed in parentheses. The values of
property links are printed out straight.)

Augments  provide  a way  of  associating  more  information  than 
just  a linkname  with  one  or  more  individual  arcs  in  the  network. 
Augments  resemble  ordinary  nodes,  except  that  they  serve  as  a 
focus  for  information  about  particular  network  arcs  rather  than 
about  conceptual  entities.  Several  arcs  may  have  the  same 
augment, and some arcs, no augment at all. Arcs lacking augments
are termed "simple", while the others are termed "augmented".
The association of augment and arc is made explicit within the
network and is effected via the property AUGMENT/OF. For
example,

12
    APRIORI     .8
    AUGMENT/OF  [concept of spend 14] (agt) (we)

would be an augment node associated with the AGT link from the
node 14 (whose ego is "concept of spend") to the node whose print
name is "we". The converse association between the arc and the
augment is built into the internal mechanism of arc
implementation and is accessible via the function GETAUG, to be
discussed in the next section. What this augment, together with
other stored information about the concept of spending, tells us
is that while any person or group of people can be the agent of
"spend", our estimated probability of its being "we" is 80%.

The impetus to provide such an augment capability was the
desire to associate probabilistic information with individual
network arcs, for example, the likelihood that concept A will
fill the AGENT case of some concept which is fillable by concepts
A, B, C or D, or the likelihood that some particular higher concept
is being discussed when word A is spoken. However, other A.I.
projects at BBN have adopted this formalism and are finding other
uses for these augments, such as SCHOLAR's use of them in
implementing I-tags.

B.  Implementation

The actual data structure in which a semantic network is
stored in the current SEMNET formalism is a LISP array, with each
node corresponding to a single array element. A node is uniquely
identified by its position in the array, e.g. item 1, item 2,
etc., where this integer is called the node's SREF (for semantic
referent). Each element of a LISP array can hold two LISP
pointers, one of which is used for the list of relational arcs
leaving the node, the other for the list of properties. Both of
these lists are stored in LISP property list format, a change
from our earlier implementation, in order to take advantage of
the CONS storage algorithm [5].
As before, all arcs with the same linkname leaving a node
are collapsed and stored together for efficiency. Thus the list
of relations and the list of properties for the node both have
the same form, i.e.:

    (<linknumber1> (<nodespec>+)
     <linknumber2> (<nodespec>+)
     ...)

where <linknumber1> is the SREF of linkname 1 etc., and a
<nodespec> is either the SREF of the node at the other end of the
link for simple links, or a pair of SREFs for augmented links.
In the latter case, the first element is the SREF of the node
reached and the second, the SREF of the augment. Each list of
nodespecs is sorted by the SREF of the node reached to make for
efficient retrieval.
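
The storage layout just described can be approximated in Python as
follows. This is only an analogy: a Python list stands in for the
LISP array, dictionaries stand in for property lists, and all of
the names are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple, Union

# A nodespec: an SREF, or (SREF of node reached, SREF of augment).
NodeSpec = Union[int, Tuple[int, int]]


def reached(spec: NodeSpec) -> int:
    """SREF of the node at the other end of the link."""
    return spec if isinstance(spec, int) else spec[0]


class Node:
    """One array element: a relation list and a property list."""

    def __init__(self) -> None:
        self.relations: Dict[int, List[NodeSpec]] = {}   # keyed by linkname SREF
        self.properties: Dict[int, List[object]] = {}


def add_arc(network: List[Node], source: int, link: int, spec: NodeSpec) -> None:
    """Insert spec under its linkname, keeping the list sorted by node reached."""
    specs = network[source].relations.setdefault(link, [])
    keys = [reached(s) for s in specs]
    specs.insert(bisect.bisect_left(keys, reached(spec)), spec)


network = [Node() for _ in range(20)]   # list position plays the role of the SREF
add_arc(network, 14, 7, 9)              # simple link: node 14 --link 7--> node 9
add_arc(network, 14, 7, (3, 12))        # augmented link to node 3, augment node 12
```

Keeping each nodespec list ordered by the node reached is what
allows a particular arc to be found without scanning the whole
list.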


C.  Utility  Packages 


Currently six files make up the SEMNET semantic network
utility package. These are:

BASICSEMNET:   functions for building and accessing a semantic
               network

EDITSEMNET:    functions for editing a network

PRINTSEMNET:   functions for printing a network in readable
               format

MERGESEMNET:   functions for merging two somewhat similar
               networks

UPDATESEMNET:  functions for updating a network created in an
               earlier version of the formalism

UTILSEMNET:    functions of general utility which are used by the
               other SEMNET packages and which are not provided
               for in LISP
Most of the top-level functions currently in BASICSEMNET,
EDITSEMNET and PRINTSEMNET have been well described in [1].
Changes made to them to accommodate the new proplist format for
relations and the institution of linknames as network nodes have
not changed their appearance to the user. Only the new augment
facility has produced changes and additions to SEMNET which
differ from the write-up in [1]. These will be discussed in the
next section, followed by a description of MERGESEMNET and
UPDATESEMNET. (Since UTILSEMNET just contains low-level
functions, we will not take the time to discuss it here.)

D.  The Augment Facility

Augments can be specified in several ways and at several
times during the construction of a network. One can specify the
augment: 1) directly, as an argument to such arc building
functions as ADDREL, ICONNECT and PUTLINK (see [1] for a
description of these and other functions not described herein);
2) in a relation specification (RELSPEC) in a call to a
node-building function like IBUILD or ADDITEM; or 3) later, in a
call to AUGLINK, a new function which changes simple links into
augmented ones.
The form of the augment information is the same for all of
the above:

<AUGMENT> ::= NIL     (* Create a simple link.)

            | ->      (* Create an augmented link in the forward
                         direction. That is, create a new node
                         (arc node) and set it to point to the
                         given link.)

            | <-      (* Do the same for the reverse direction.)

            | T       (* Make both links augmented.)

            | (-> <AUGINFO>+)   (* Create a forward augment and
                                   hang off it the information in
                                   AUGINFO.)

            | (<- <AUGINFO>+)   (* Do the same for the reverse
                                   direction.)

            | (-> @ <NODESPEC>) (* Make the node specified in
                                   NODESPEC the arc node.)

            | (<- @ <NODESPEC>) (* Do the same for the reverse
                                   direction.)

            | ((-> <AUGINFO>+)(<- <AUGINFO>+))
                      (* Augment both forward and reverse links as
                         indicated. Here again <AUGINFO>+ may be
                         replaced by the sequence @ <NODESPEC>.)

<AUGINFO>  ::= (<REL> <TERM>) | (<PROP> <VALUE>)
                      (* i.e. a RELSPEC)

<NODESPEC> ::= <NODE> (* i.e. an integer)
             | a function which evaluates to a node


There  are  top-level  functions  for  getting  an  augment,  adding 
information  to  an  augment,  editing  an  augment,  deleting 
information therein, converting an augmented arc to a simple one,
and  printing  an  augment.  Note  that  we  have  enabled  only 
relations  and  not  properties  to  be  augmented,  though  should  the 
need  be  felt,  the  facility  could  be  so  extended.  The  following 
describes  both  new  top-level  functions  and  changes  to  existing 
ones  which  enable  augments  to  be  added  and  used. 
1.  Arc Building Functions


(ADDREL  ITEMA  R ITEMB  AUGMENT)* 

where ITEMA and ITEMB are nodes and R is a relation which
has an inverse. ADDREL behaves as before if AUGMENT is NIL.
That is, it adds ITEMB to the list of nodes reached by following
R links from ITEMA and adds ITEMA to the list of R-inverse links
leaving ITEMB. If AUGMENT is ->, T, (-> ...) or
((-> ...)(<- ...)), it creates an appropriate arc node and adds
(ITEMB . arcnode) to the list of R links leaving ITEMA. If
AUGMENT is <-, T, (<- ...) or ((-> ...)(<- ...)), it again
creates an appropriate arc node and adds (ITEMA . arcnode) to the
list of R-inverse links leaving ITEMB. E.g.
ADDREL(FRUIT KINDS BANANA (-> (APRIORI .4)))
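
The effect of the AUGMENT argument on the two links can be
sketched as follows. This is a Python analogy, not the SEMNET
code: the class is hypothetical, the AUGMENT forms are simplified
to two optional dictionaries (an empty dictionary standing for a
bare -> or <-), and the inverse name KIND/OF is invented for the
example.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, Optional[int]]          # (node reached, arc-node id or None)


class Net:
    def __init__(self, inverse: Dict[str, str]) -> None:
        self.inverse = inverse
        self.links: Dict[Tuple[str, str], List[Arc]] = {}
        self.arc_nodes: List[dict] = []   # augment information per arc node

    def _new_arc_node(self, info: dict) -> int:
        self.arc_nodes.append(dict(info))
        return len(self.arc_nodes) - 1

    def add_rel(self, a: str, r: str, b: str,
                forward: Optional[dict] = None,
                reverse: Optional[dict] = None) -> None:
        """Add a -R-> b and b -R-inverse-> a.

        forward / reverse: None for a simple link, or a dict of
        augment information for an augmented one.
        """
        fwd = None if forward is None else self._new_arc_node(forward)
        rev = None if reverse is None else self._new_arc_node(reverse)
        self.links.setdefault((a, r), []).append((b, fwd))
        self.links.setdefault((b, self.inverse[r]), []).append((a, rev))


net = Net({"KINDS": "KIND/OF"})           # hypothetical inverse name
net.add_rel("FRUIT", "KINDS", "BANANA", forward={"APRIORI": 0.4})
```

In this call only the forward link carries an arc node, mirroring
the (-> (APRIORI .4)) example; the inverse link stays simple.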


(PUTLINK  ITEMA  R ITEMB  AUGMENT) 

where  ITEMA  and  ITEMB  are  again  nodes  and  R is  a relation. 
PUTLINK  behaves  as  before  if  AUGMENT  is  NIL.  Otherwise,  it  adds 
an augmented link with the appropriate information. Note the
only  sensible  values  for  AUGMENT  here  are  NIL,  ->,  and  (->  ...), 
since  PUTLINK  only  creates  a link  in  the  forward  direction. 


(ICONNECT ITEMA R ITEMB AUGMENT)

ICONNECT behaves just like ADDREL, except that ITEMA and
ITEMB can be either pnames or forms that evaluate to a list of
nodes.


2.  Node Building Functions


( 1 BUILD  RELSPEC+) 

where  RELSPEC  (R  ITEM  AUGMENT),  R is  a relation,  and 

ITEM  is  either  a node,  a pname  or  a form  that  evaluates  to  a list 
of  nodes.  IBUILD  behaves  as  before,  except  that  when  the  AUGMENT 
in  a RELSPEC  is  non-NIL,  it  creates  the  appropriate  kind  and 
number  of  augmented  links.  For  example, 

(IBUILD (PNAME FRUIT) (KINDS APPLE ((-> (APRIORI .8))
                                    (<- (APRIORI .4))))
        (KINDS PEAR ->) (KINDS BANANA)
        (KINDS QUINCE T))
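To make "the appropriate kind and number of augmented links" concrete, the following Python sketch works out, for each RELSPEC in the example, which directions of the link would be augmented. It is only a reading of the AUGMENT conventions given for ADDREL, not the IBUILD code itself; the function names augmented_sides and ibuild_plan are hypothetical, and Lisp T and NIL are represented by True and None.

```python
def augmented_sides(augment):
    """Return the set of link directions ('->', '<-') that AUGMENT asks to augment."""
    if augment is None:                 # NIL: simple links in both directions
        return set()
    if augment is True:                 # T: augment both directions
        return {"->", "<-"}
    if augment in ("->", "<-"):         # bare direction symbol
        return {augment}
    if augment[0] in ("->", "<-"):      # (-> ...) or (<- ...)
        return {augment[0]}
    return {part[0] for part in augment}  # ((-> ...) (<- ...))

def ibuild_plan(*relspecs):
    """List (relation, item, augmented directions) for each RELSPEC."""
    plan = []
    for relation, item, *rest in relspecs:
        augment = rest[0] if rest else None   # a missing AUGMENT is treated as NIL
        plan.append((relation, item, augmented_sides(augment)))
    return plan

plan = ibuild_plan(
    ("PNAME", "FRUIT"),
    ("KINDS", "APPLE", (("->", ("APRIORI", 0.8)), ("<-", ("APRIORI", 0.4)))),
    ("KINDS", "PEAR", "->"),
    ("KINDS", "BANANA"),
    ("KINDS", "QUINCE", True),
)
for relation, item, sides in plan:
    print(relation, item, sorted(sides))
```

Under this reading, APPLE and QUINCE get augmented links in both directions, PEAR only in the forward direction, and BANANA gets simple links.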




BBN  Report  No.  3018 


Bolt  Beranek  and  Newman  Inc 


(ADDITEM  ITEMA  (SUPERELSPEC) +) 

where  ITEMA  is  either  a node,  a pname,  or  a form  that 
evaluates  to  a list  of  nodes,  and 

SUPERELSPEC = (R LINKSPEC+)

LINKSPEC = ITEM | (+ ITEM AUGMENT)

The  only  difference  between  this  and  the  earlier  version  is  that 
one  can  now  specify  an  augment  in  a linkspec.  If  ITEM  is  a form, 
the  same  AUGMENT  will  be  put  on  the  link  from  ITEMA  to  each  node 
resulting from evaluating ITEM.
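The LINKSPEC convention can be illustrated with a short Python sketch that expands one SUPERELSPEC into individual (relation, node, augment) triples. The function name expand_superelspec is hypothetical, and a Python callable stands in for "a form that evaluates to a list of nodes"; neither is taken from the package itself.

```python
def expand_superelspec(spec):
    """Expand (R LINKSPEC+) into a list of (relation, node, augment) triples."""
    relation, *linkspecs = spec
    triples = []
    for linkspec in linkspecs:
        if isinstance(linkspec, tuple) and linkspec and linkspec[0] == "+":
            _, item, augment = linkspec        # (+ ITEM AUGMENT)
        else:
            item, augment = linkspec, None     # bare ITEM: simple link
        # A callable models a form; every node it yields gets the same augment.
        nodes = item() if callable(item) else [item]
        triples.extend((relation, node, augment) for node in nodes)
    return triples

triples = expand_superelspec(
    ("KINDS", "APPLE", ("+", lambda: ["PEAR", "QUINCE"], "->"))
)
print(triples)
```

Here APPLE is linked simply, while PEAR and QUINCE, which both come from the same form, each receive the same -> augment.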


3.  New  functions 


(ADDAUGINFO  ITEMA  R ITEMB  AUGINFO) 

ADDAUGINFO  adds  further  information  to  the  appropriate  arc 
node. AUGINFO is a list of RELSPECS, as in a call to IBUILD.
(Also  see  definition  of  AUGINFO  above.) 

(AUGLINK ITEMA R ITEMB AUGMENT)

AUGLINK changes a simple link into an augmented link. Note
that AUGLINK only changes the link specified, and not its
inverse. The only sensible values of AUGMENT then are -> and
(-> ...).

(GETAUG ITEMA R ITEMB)

GETAUG returns the augment associated with the arc from
ITEMA to ITEMB via relation R if one exists, otherwise NIL.

(IEDITAUGP ITEMA R ITEMB)

IEDITAUGP allows one to edit the property information hung
off the arc node associated with the particular link from A to B
via R.

(IEDITAUGR ITEMA R ITEMB)

IEDITAUGR allows one to edit the relation information hung
off the arc node associated with the particular link from A to B
via R.


(REMAUG  ITEMA  R ITEMB) 


REMAUG changes the augmented link from ITEMA to ITEMB into a
simple one. It is the reverse of AUGLINK. The abandoned augment
is put on the FREELIST for re-use if there are no other arcs with
which it is associated.

(DAUG ITEMA R ITEMB)

DAUG,  for  "describe  augment",  prints  out  the  arc  node 
associated  with  the  link  between  ITEMA  and  ITEMB  via  R. 
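The relationship between AUGLINK, GETAUG, REMAUG and the FREELIST might be pictured as in the Python sketch below. The Net class, its dictionary layout, and the decision to re-use arc nodes from the free list inside auglink are assumptions made for illustration; the report does not describe the internal storage.

```python
class Net:
    """Toy store of directed links; an arc node is modelled as a dict."""

    def __init__(self):
        self.links = {}     # (item_a, r, item_b) -> arc node, or None if simple
        self.freelist = []  # abandoned arc nodes kept for re-use

    def auglink(self, a, r, b, info=None):
        """Turn the existing simple link a -R-> b into an augmented one."""
        if (a, r, b) not in self.links:
            raise KeyError("no such link")
        arc = self.freelist.pop() if self.freelist else {}
        arc.clear()
        arc.update(info or {})
        self.links[(a, r, b)] = arc   # only this direction is changed

    def getaug(self, a, r, b):
        """Return the arc node on the link, or None if there is none."""
        return self.links.get((a, r, b))

    def remaug(self, a, r, b):
        """Make the link simple again and recycle its arc node if unshared."""
        arc = self.links[(a, r, b)]
        self.links[(a, r, b)] = None
        if arc is not None and not any(v is arc for v in self.links.values()):
            self.freelist.append(arc)

net = Net()
net.links[("FRUIT", "KINDS", "PEAR")] = None       # a simple link
net.auglink("FRUIT", "KINDS", "PEAR", {"APRIORI": 0.8})
before = net.getaug("FRUIT", "KINDS", "PEAR")
net.remaug("FRUIT", "KINDS", "PEAR")
after = net.getaug("FRUIT", "KINDS", "PEAR")
```

The check in remaug mirrors the condition stated above: an arc node goes onto the free list only when no other arc still refers to it.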


E.  MERGESEMNET 


MERGESEMNET  is  a package  of  functions  for  merging  two 
"somewhat  similar"  semantic  networks,  thereby  enabling  two  or 
more  people  to  work  independently  on  the  same  semantic  network 
and  later  combine  their  results.  The  merger  is  invoked  by  the 
function  MERGENETS,  whose  two  arguments  name  the  two  files 
containing  the  networks  to  be  merged  and  whose  result  is  a file 
containing the merged network, e.g. (MERGENETS <WARNOCK>MYNET
<AIELLO>MYNET). The following assumptions are made by MERGENETS:


1. Both semantic networks have been made using only the
functions in BASICSEMNET and EDITSEMNET (i.e. there are
no relations, nodes, links, or properties unnatural to the
structure building and modifying functions found in these
files).

2.  Both  networks  have  been  filed  using  the  NET;  macro  found  on 
BASICSEMNET. 

3.  BASICSEMNET  has  been  loaded  into  the  system  in  which  the 
merger  is  being  done.  MERGENETS  also  requires  UTILSEMNET  to 
be  loaded. 

4.  Relations  defined  in  both  networks  have  the  same  definitions 
(i.e.  the  same  inverse),  though  both  networks  need  not  have 
the  same  set  of  relations. 

5.  Networks  may  contain  augmented  as  well  as  simple  links. 
There is one caution, however: if the link from node A to
node B is augmented in both networks, a message to that
effect will be printed out to the user, but only the augment
from the first network (i.e. the first file name) will
appear in the resulting merged network. It has been left to
the user to decide what should be done with possibly
dissimilar and/or conflicting augments.

6. MERGENETS is undoable. However, the files
containing the two input semantic networks remain around and
untouched following the merger.

7.  The  file  containing  the  output  of  MERGENETS  will  be  a later 
version  of  the  first  argument  file  to  MERGENETS.  The  output 
of  MERGENETS  remains  in-core  as  well  for  further  additions, 
modifications,  or  disembowelments. 
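The policy for augmented links in item 5 can be sketched in Python as follows. This is an illustrative model, not MERGENETS itself: networks are reduced to dictionaries from (node, relation, node) to an augment or None, the function name merge_nets and its report parameter are hypothetical, and the handling of a link that is simple in the first network but augmented in the second (the augment is kept) is an assumption, since the report does not cover that case.

```python
def merge_nets(first, second, report=print):
    """Merge two link tables; on a double augment the first network wins."""
    merged = dict(first)
    for link, augment in second.items():
        if link not in merged:
            merged[link] = augment            # link only in the second network
        elif merged[link] is None:
            merged[link] = augment            # assumption: keep the only augment
        elif augment is not None:
            # Augmented in both: tell the user, keep the first network's augment.
            report(f"link {link} is augmented in both networks; "
                   "keeping the augment from the first")
    return merged

first = {
    ("FRUIT", "KINDS", "APPLE"): {"APRIORI": 0.8},
    ("FRUIT", "KINDS", "PEAR"): None,
}
second = {
    ("FRUIT", "KINDS", "APPLE"): {"APRIORI": 0.5},
    ("FRUIT", "KINDS", "PEAR"): {"APRIORI": 0.1},
    ("FRUIT", "KINDS", "QUINCE"): None,
}
messages = []
merged = merge_nets(first, second, report=messages.append)
```

The two input tables are left unmodified, which corresponds to the input files remaining untouched after the merger.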


F. UPDATESEMNET


A set of functions, called by the function UPDATE, exists
for bringing semantic networks whose format reflects an older
version of BASICSEMNET into the new formalism. UPDATE takes as input
the name of a file containing an old-format semantic network and
outputs a new version of that file containing an up-to-date net.
Since UPDATE itself checks the form of the input network, no
further specifications need be given by the user on what kinds of
updating must be done.


Bonnie  Nash-Webber 
Appendix A


Digitized  Sentences  for  Travel  Budget  Task 



100. Give me a list of the remaining trips and their estimated
costs.

101.  What  do  we  have  budgeted  for  the  ACL  meeting? 

102.  What  is  the  total  budget  figure? 

103.  What  trips  have  been  taken  since  January? 

104.  List  all  trips  already  taken. 

105.  Change  the  cost  of  a trip  to  Amherst  to  sixteen  dollars. 

106.  List  all  trips  to  California  this  year. 

107.  How  many  trips  has  Craig  taken? 

108. What is the round trip fare to Pittsburgh?

109.  Is  two  hundred  dollars  enough  for  a four  day  trip  to  New 
York? 

110.  What  is  the  registration  fee? 

111.  When  did  Bill  go  to  Washington? 

112.  I need  to  take  a trip  to  Los  Angeles. 

113.  Is  John  scheduled  to  go  to  Carnegie? 

114. Who paid for my trip to IJCAI?

115.  Give  me  a breakdown  of  the  expense  to  send  one  person  to 
London. 

116.  Change  the  travel  estimate  to  ten  dollars  for  the  bus. 

117. The final cost of the trip was fifty-six dollars and
sixty-six cents.

118.  How  much  did  we  ask  for? 

119.  Who's  going  to  IFIP? 

120.  How  much  do  we  have  left  in  the  budget? 

121.  How  much  does  it  cost  to  send  someone  to  California  for  a 
week? 

122.  Which  conference  is  the  most  expensive? 

123.  I want  to  know  what  trips  Bill  will  take  this  winter. 

124.  Am  I going  anywhere  in  late  November? 

125.  When  is  the  next  ASA  meeting? 

126. How much have we already spent?

127.  Can  we  afford  an  additional  person  to  the  ASA  meeting  in 
St.  Louis? 
Appendix B

Example  of  Ideal  Hand  Labels 


1760  * 

1760/GIVE
1760  G 
1900  IH  2 
2920  V 
3250  * 

3250/ME 
3250  M 
3840  IY  1 
5100  * 

5100/A
5100  AX 
5470  * 

5470/LIST
5470 L
6200  IH  2 
7000  * 

7000  S 
7800  SI 
8160 T
8440/OF 
8440  AX 
8900  V 
9220  * 

9220/THE 
9220  DH 
9112  AX 
10100  * 
10100/REMAINING 
10100  R 
10520  IY  0 
11360  * 

11360  M 
12020  EY  2 
13380  * 

13380  N 
13680  IH  1 
14380  NX 
14900  * 

14900/TRIPS 
14900  SI 
15300  T 
16400  R 
16750  IH  2 
17550  SI 
18200  P 
18350 *
18350 S
19300/AND
19300 EH 1
20300 N
20920 *
20920/THEIR
20920 DH
21160 EH 1
21780 *
21780 R
22400/ESTIMATED
22400 EH 2
23370 *
23370 S
23860 SI
24370 T
24670 IX
24970 *
24970 M
25540 EY 1
26100 *
26100 Y
26440 IX
27500 URD
27900 *
27900/COSTS
27900 SI
28300 K
28980 AO 2
31200 S
32400 SI
33170 T
33450 S
35000/(END)
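The layout of the hand-label listing can be read mechanically, as the Python sketch below suggests. It rests on several assumptions that the appendix does not state: that a "time/WORD" line opens a new word, that a "time label" line gives the start of a segment within the current word, and that "*" lines are boundary markers that can be skipped. The function name parse_labels is hypothetical, and neither the time units nor the digit following some labels are interpreted.

```python
def parse_labels(text):
    """Group 'time label' lines under the preceding 'time/WORD' line."""
    words, current = [], None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue                           # skip blank lines
        if "/" in parts[0]:
            time, word = raw.strip().split("/", 1)
            current = {"word": word.strip(), "start": int(time), "labels": []}
            words.append(current)
            continue
        label = " ".join(parts[1:])
        if label == "*" or current is None:
            continue                           # boundary marker, not a segment
        current["labels"].append((int(parts[0]), label))
    return words

sample = """
1760 *
1760/GIVE
1760 G
1900 IH 2
2920 V
3250 *
3250/ME
3250 M
3840 IY 1
5100 *
"""
words = parse_labels(sample)
for w in words:
    print(w["word"], w["start"], w["labels"])
```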
Appendix C: Sample Sentences


Some  of  the  Sentences  Parsed  by  the  SPEECHLIS  Parser 


Monday  April  tenth. 

Monday  the  tenth  of  April. 

April  tenth. 

One  July. 

July  one. 

April  one  seventy  five.  (i.e.  April  1,  '75) 

Monday  the  tenth  of  April  nineteen  seventy  five. 

July  one  nineteen  seventy  four. 

Thirty  one  April  seventy  five. 

April  seventy  five. 

April  nineteen  seventy  five. 

April. 

The  tenth  of  April  nineteen  seventy  five. 

When  is  John  going? 

Who is going to IFIP?

It  costs  four  hundred  dollars  to  go  to  California. 

I want  John  to  go. 

We  started  to  spend  money. 

I want  to  go. 

Suppose that the budget has five K dollars.

I have  arranged  for  John  to  go. 

I arranged  that  John  will  go. 

Twenty  one  people. 

The  trips  that  were  taken  in  July. 

Schedule John a trip to California.

The  budgets  which  have  money. 

Nine  people. 

Which  is  the  biggest  trip? 

Which  conference  is  the  biggest? 

Give  me  a list  of  the  remaining  trips  with  the  estimated 
costs. 

The trip was taken by Bill.

I want  you  to  cancel  that  trip. 

How  much  did  we  spend? 

The  person  to  whom  I sent  money. 

The registration fee for that meeting is forty dollars.

Nine  people  will  be  going  to  Pittsburgh  in  April  for  the 
IFIP  conference. 
Appendix D: Sample Parsings

SENTENCE: (APRIL TENTH)
37 CONFIGS, 32 TRANS
S NPU

NP DATE NUM 10

MONTH  APRIL 


SENTENCE:  (MONDAY  THE  TENTH  OF  APRIL) 
52  CONFIGS,  46  TRANS 
S NPU 

NP  DATE  DAY  MONDAY 
NUM  10 
MONTH  APRIL 


SENTENCE:  (APRIL  ONE  SEVENTY  FIVE) 
56  CONFIGS,  60  TRANS 
S NPU 

NP  DATE  NUM  1 

MONTH  APRIL 
YEAR  75 


SENTENCE:  (THIRTY  ONE  APRIL  NINETEEN  SEVENTY  FIVE) 
115  CONFIGS,  133  TRANS 
S NPU

NP  DATE  NUM  31 

MONTH  APRIL 
YEAR  1975 


SENTENCE:  (I  WANT  TO  GO) 

69  CONFIGS,  64  TRANS 
S DCL 
NP  DET 
PRO  I 

FEATS  NU  SG 

ROLE  SUBJ 
AUX  TNS  PRESENT 
VOICE  ACTIVE 
VP  V WANT 

NP  S TOCOMP 
NP  DET 
PRO  I 

FEATS  NU  SG 

ROLE  SUBJ 
AUX  TNS  NIL 

VOICE  ACTIVE 
VP  V GO 


SENTENCE:  (SCHEDULE  JOHN  A TRIP  TO  CALIFORNIA) 
194 CONFIGS, 179 TRANS
S IMP 
NP  DET 

PRO  YOU 
FEATS  NU  SG 
AUX  TNS  PRESENT 
VOICE  ACTIVE 
VP  V SCHEDULE 
NP  DET  ART  A 
N TRIP 
FEATS  NU  SG 
PP  PREP  FOR 
NP  DET 

NPR  JOHN 
FEATS  NU  SG 
PP  PREP  TO 
NP  DET 

NPR  CALIFORNIA 
FEATS  NU  SG 


S IMP 
NP  DET 

PRO  YOU 
FEATS  NU  SG 
AUX  TNS  PRESENT 
VOICE  ACTIVE 
VP V SCHEDULE
NP  DET  ART  A 
N TRIP 
PP  PREP  TO 
NP  DET 

NPR  CALIFORNIA 
FEATS  NU  SG 
FEATS  NU  SG 
PP  PREP  FOR 
NP  DET 

NPR  JOHN 
FEATS  NU  SG 

(*Two parsings are found in parallel, with the ambiguity
to be resolved later by Semantics.)


SENTENCE:  (WHICH  IS  THE  BIG  -EST  TRIP) 
88  CONFIGS,  79  TRANS 

S Q

NP  DET  ART  THE 
BIG 

ADJ  SUPERLATIVE 
N TRIP 
FEATS  NU  SG 
AUX  TNS  PRESENT 
VOICE  ACTIVE 
VP  V BE 
NP N WHQ

FEATS NU SG


SENTENCE:  (WHICH  CONFERENCE  IS  THE  BIG  -EST) 
99  CONFIGS,  80  TRANS 
S Q 

NP  DET  ART  THE 
BIG 

ADJ  SUPERLATIVE 
PRO  ONE 
FEATS  NU  SG 
AUX  TNS  PRESENT 
VOICE  ACTIVE 
VP  V BE 

NP  DET  WHICH*} 

N CONFERENCE 
FEATS  NU  SG 


SENTENCE:  (I  WANT  YOU  TO  CANCEL  THAT  TRIP) 
152  CONFIGS,  149  TRANS 
S DCL 
NP  DET 
PRO  I 

FEATS NU SG

ROLE SUBJ
AUX TNS PRESENT
VOICE ACTIVE
VP  V WANT 

NP  S TOCOMP 
NP  DET 

PRO  YOU 

FEATS  NU  SG/PL 
AUX  TNS  NIL 

VOICE  ACTIVE 
VP  V CANCEL 

NP  DET  ART  THAT 
N TRIP 
FEATS  NU  SG 


SENTENCE:  (THE  TRIP  WAS  TAKEN  BY  BILL) 
131  CONFIGS,  104  TRANS 
S DCL 

NP  DET  ART  THE 
N TRIP 
FEATS  NU  SG 
AUX  TNS  PAST 

VOICE  PASSIVE 
VP  V TAKE 

PP  PREP  BY 
NP  DET 

NPR  BILL 
FEATS  NU  SG 
SENTENCE: (TWENTY ONE PEOPLE)
42  CONFIGS,  37  TRANS 
S NPU 

NP  DET  POSTART  INTEGER  21 
N PERSON 
FEATS  NU  PL 


SENTENCE:  (I  HAVE  ARRANGE  -D  FOR  JOHN  TO  GO) 
140  CONFIGS,  126  TRANS 
S DCL 
NP  DET 
PRO  I 

FEATS  NU  SG 

ROLE  SUBJ 
AUX  TNS  PRESENT 
PERFECT 
VOICE  ACTIVE 
VP  V ARRANGE 
NP  S FORCOMP 
NP  DET 

NPR  JOHN 
FEATS  NU  SG 
AUX  TNS  NIL 

VOICE  ACTIVE 
VP  V GO 